This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches, which operate on quantised audio features, with continuous-valued approaches. In the discrete case, we propose a method for computing the normalised compression distance (NCD) that accounts for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, one using a data set of 300 Jazz standards and one using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the NCD based on string compression and prediction, and observe that our proposed normalised compression distance with alignment (NCDA) improves average performance over NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset.
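For readers unfamiliar with the normalised compression distance, the following is a minimal sketch of the standard NCD between two quantised feature sequences, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C denotes compressed length. It does not reproduce the proposed NCDA or the paper's feature extraction. The choice of `zlib` as the compressor, the helper names `compressed_size` and `ncd`, and the toy byte sequences are all assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard normalised compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of x and y
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical quantised sequences: each byte stands for one quantised frame.
chords_a = bytes([0, 4, 7, 5, 9, 0, 7, 11, 2, 7, 4, 0] * 50)
# Same material with a different starting point, standing in for a "cover".
chords_b = chords_a[6:] + chords_a[:6]
# Unrelated sequence drawn from a seeded pseudo-random generator.
noise = bytes(random.Random(0).randrange(256) for _ in range(600))

print("related:  ", ncd(chords_a, chords_b))
print("unrelated:", ncd(chords_a, noise))
```

In this toy setting the related pair should yield a smaller distance than the unrelated pair, because the concatenation of related sequences compresses almost as well as either sequence alone.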